Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 834508 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 76.4 MiB |
| Average record size in memory | 96.0 B |
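The memory figures above can be cross-checked with simple arithmetic. The sketch below assumes every cell is stored as an 8-byte value (for example a 64-bit integer, float, or timestamp) and ignores index overhead; the report itself does not state the dtypes, so `BYTES_PER_VALUE` is an assumption.

```python
# Figures copied from the Dataset statistics table.
n_rows = 834_508
n_cols = 12
BYTES_PER_VALUE = 8  # assumption: 64-bit storage per cell, not stated in the report
MIB = 1024 ** 2

record_bytes = n_cols * BYTES_PER_VALUE        # bytes per row
total_mib = n_rows * record_bytes / MIB        # whole data frame
column_mib = n_rows * BYTES_PER_VALUE / MIB    # a single column

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the results agree with the reported 96.0 B per record, 76.4 MiB in total, and the 6.4 MiB "Memory size" listed for each variable.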
Variable types
| DateTime | 1 |
|---|---|
| Numeric | 11 |
| town is highly correlated with street_name | High correlation |
|---|---|
| flat_type is highly correlated with floor_area_sqft and 1 other fields | High correlation |
| block is highly correlated with street_name and 1 other fields | High correlation |
| street_name is highly correlated with town and 2 other fields | High correlation |
| lease_commence_date is highly correlated with block and 2 other fields | High correlation |
| remaining_lease is highly correlated with lease_commence_date | High correlation |
| floor_area_sqft is highly correlated with flat_type and 1 other fields | High correlation |
| price_psqft is highly correlated with resale_price | High correlation |
| resale_price is highly correlated with flat_type and 2 other fields | High correlation |
| town is highly correlated with street_name | High correlation |
| flat_type is highly correlated with floor_area_sqft and 1 other fields | High correlation |
| street_name is highly correlated with town and 2 other fields | High correlation |
| lease_commence_date is highly correlated with street_name and 1 other fields | High correlation |
| remaining_lease is highly correlated with street_name and 1 other fields | High correlation |
| floor_area_sqft is highly correlated with flat_type and 1 other fields | High correlation |
| price_psqft is highly correlated with resale_price | High correlation |
| resale_price is highly correlated with flat_type and 2 other fields | High correlation |
| town is highly correlated with street_name | High correlation |
| flat_type is highly correlated with floor_area_sqft and 1 other fields | High correlation |
| street_name is highly correlated with town and 1 other fields | High correlation |
| lease_commence_date is highly correlated with street_name | High correlation |
| floor_area_sqft is highly correlated with flat_type | High correlation |
| price_psqft is highly correlated with resale_price | High correlation |
| resale_price is highly correlated with flat_type and 1 other fields | High correlation |
| block is highly correlated with street_name and 2 other fields | High correlation |
| remaining_lease is highly correlated with street_name and 1 other fields | High correlation |
| street_name is highly correlated with block and 5 other fields | High correlation |
| town is highly correlated with street_name and 1 other fields | High correlation |
| flat_type is highly correlated with resale_price and 2 other fields | High correlation |
| storey_range is highly correlated with price_psqft | High correlation |
| resale_price is highly correlated with flat_type and 4 other fields | High correlation |
| lease_commence_date is highly correlated with block and 7 other fields | High correlation |
| flat_model is highly correlated with block and 6 other fields | High correlation |
| price_psqft is highly correlated with storey_range and 3 other fields | High correlation |
| floor_area_sqft is highly correlated with street_name and 4 other fields | High correlation |
| town has 47951 (5.7%) zeros | Zeros |
| storey_range has 161366 (19.3%) zeros | Zeros |
| flat_model has 218654 (26.2%) zeros | Zeros |
| lease_commence_date has 19302 (2.3%) zeros | Zeros |
Reproduction
| Analysis started | 2021-07-16 06:01:56.749140 |
|---|---|
| Analysis finished | 2021-07-16 06:03:44.488183 |
| Duration | 1 minute and 47.74 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
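The reported duration follows directly from the two timestamps. A minimal check using only the standard library, with the timestamps copied from the table above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2021-07-16 06:01:56.749140")
finished = datetime.fromisoformat("2021-07-16 06:03:44.488183")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```

This agrees with the stated duration of 1 minute and 47.74 seconds.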
date
Date
| Distinct | 378 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.4 MiB |
| Minimum | 1990-01-01 00:00:00 |
| Maximum | 2021-06-01 00:00:00 |
Histogram with fixed size bins (bins=50)
town
Real number (ℝ≥0)
| Distinct | 27 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.53692116 |
| Minimum | 0 |
| Maximum | 26 |
| Zeros | 47951 |
| Zeros (%) | 5.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 6 |
| median | 12 |
| Q3 | 19 |
| 95-th percentile | 25 |
| Maximum | 26 |
| Range | 26 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 7.772458369 |
|---|---|
| Coefficient of variation (CV) | 0.6199654822 |
| Kurtosis | -1.200443893 |
| Mean | 12.53692116 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.1259931216 |
| Sum | 10462161 |
| Variance | 60.4111091 |
| Monotonicity | Not monotonic |
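Several of the descriptive statistics are derived from the others: the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The sketch below recomputes them from the values reported for the variable summarised above; the variable names are illustrative only.

```python
import math

# Values copied from the tables above.
n = 834_508
mean = 12.53692116
std = 7.772458369
q1, q3 = 6, 19
minimum, maximum = 0, 26

variance = std ** 2               # reported: 60.4111091
cv = std / mean                   # reported: 0.6199654822
iqr = q3 - q1                     # reported: 13
value_range = maximum - minimum   # reported: 26
total = mean * n                  # reported sum: 10462161

print(f"variance = {variance:.7f}")
print(f"CV       = {cv:.10f}")
print(f"IQR      = {iqr}, range = {value_range}")
print(f"sum      = {round(total)}")
print(math.isclose(variance, 60.4111091, rel_tol=1e-6))
```

The same relationships hold for every numeric variable in the report, so they are a quick way to sanity-check a transcribed table.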
Histogram with fixed size bins (bins=27)
| Value | Count | Frequency (%) |
| 18 | 73779 | 8.8% |
| 21 | 63621 | 7.6% |
| 1 | 61390 | 7.4% |
| 12 | 60808 | 7.3% |
| 20 | 58882 | 7.1% |
| 0 | 47951 | 5.7% |
| 10 | 46038 | 5.5% |
| 3 | 40429 | 4.8% |
| 7 | 33964 | 4.1% |
| 4 | 30752 | 3.7% |
| Other values (17) | 316894 |
| Value | Count | Frequency (%) |
| 0 | 47951 | |
| 1 | 61390 | |
| 2 | 19823 | 2.4% |
| 3 | 40429 | |
| 4 | 30752 | |
| 5 | 2318 | 0.3% |
| 6 | 6451 | 0.8% |
| 7 | 33964 | |
| 8 | 25737 | |
| 9 | 25810 |
| Value | Count | Frequency (%) |
| 26 | 13990 | 1.7% |
| 25 | 30610 | |
| 24 | 24750 | 3.0% |
| 23 | 11223 | 1.3% |
| 22 | 62 | < 0.1% |
| 21 | 63621 | |
| 20 | 58882 | |
| 19 | 28537 | 3.4% |
| 18 | 73779 | |
| 17 | 21307 | 2.6% |
flat_type
Real number (ℝ≥0)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.137370762 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 1106 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.124325711 |
|---|---|
| Coefficient of variation (CV) | 0.5260321374 |
| Kurtosis | 0.8133138708 |
| Mean | 2.137370762 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.088219248 |
| Sum | 1783653 |
| Variance | 1.264108303 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 2 | 314371 | |
| 1 | 270536 | |
| 3 | 174358 | |
| 5 | 63729 | 7.6% |
| 4 | 9896 | 1.2% |
| 0 | 1106 | 0.1% |
| 6 | 512 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 1106 | 0.1% |
| 1 | 270536 | |
| 2 | 314371 | |
| 3 | 174358 | |
| 4 | 9896 | 1.2% |
| 5 | 63729 | 7.6% |
| 6 | 512 | 0.1% |
| Value | Count | Frequency (%) |
| 6 | 512 | 0.1% |
| 5 | 63729 | 7.6% |
| 4 | 9896 | 1.2% |
| 3 | 174358 | |
| 2 | 314371 | |
| 1 | 270536 | |
| 0 | 1106 | 0.1% |
block
Real number (ℝ≥0)
| Distinct | 2529 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 467.8094626 |
| Minimum | 0 |
| Maximum | 2528 |
| Zeros | 1325 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 166 |
| median | 330 |
| Q3 | 597 |
| 95-th percentile | 1561 |
| Maximum | 2528 |
| Range | 2528 |
| Interquartile range (IQR) | 431 |
Descriptive statistics
| Standard deviation | 456.5613758 |
|---|---|
| Coefficient of variation (CV) | 0.9759558374 |
| Kurtosis | 3.79556914 |
| Mean | 467.8094626 |
| Median Absolute Deviation (MAD) | 197 |
| Skewness | 1.915130896 |
| Sum | 390390739 |
| Variance | 208448.2899 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 125 | 4272 | 0.5% |
| 209 | 3740 | 0.4% |
| 115 | 3169 | 0.4% |
| 17 | 3167 | 0.4% |
| 229 | 3087 | 0.4% |
| 238 | 3079 | 0.4% |
| 173 | 3058 | 0.4% |
| 133 | 2994 | 0.4% |
| 256 | 2974 | 0.4% |
| 266 | 2959 | 0.4% |
| Other values (2519) | 802009 |
| Value | Count | Frequency (%) |
| 0 | 1325 | |
| 1 | 1471 | |
| 2 | 2184 | |
| 3 | 1830 | |
| 4 | 1323 | |
| 5 | 1636 | |
| 6 | 1512 | |
| 7 | 1214 | |
| 8 | 1661 | |
| 9 | 1295 |
| Value | Count | Frequency (%) |
| 2528 | 1 | < 0.1% |
| 2527 | 1 | < 0.1% |
| 2526 | 3 | |
| 2525 | 1 | < 0.1% |
| 2524 | 2 | < 0.1% |
| 2523 | 1 | < 0.1% |
| 2522 | 1 | < 0.1% |
| 2521 | 1 | < 0.1% |
| 2520 | 1 | < 0.1% |
| 2519 | 7 |
street_name
Real number (ℝ≥0)
| Distinct | 572 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 229.8856787 |
| Minimum | 0 |
| Maximum | 571 |
| Zeros | 4631 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 110 |
| median | 213 |
| Q3 | 355 |
| 95-th percentile | 488 |
| Maximum | 571 |
| Range | 571 |
| Interquartile range (IQR) | 245 |
Descriptive statistics
| Standard deviation | 153.7811131 |
|---|---|
| Coefficient of variation (CV) | 0.6689460343 |
| Kurtosis | -1.040854138 |
| Mean | 229.8856787 |
| Median Absolute Deviation (MAD) | 122 |
| Skewness | 0.2124285996 |
| Sum | 191841438 |
| Variance | 23648.63076 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 218 | 16232 | 1.9% |
| 9 | 13735 | 1.6% |
| 3 | 12836 | 1.5% |
| 1 | 11277 | 1.4% |
| 273 | 8698 | 1.0% |
| 185 | 7758 | 0.9% |
| 10 | 6960 | 0.8% |
| 13 | 6894 | 0.8% |
| 2 | 6736 | 0.8% |
| 212 | 6174 | 0.7% |
| Other values (562) | 737208 |
| Value | Count | Frequency (%) |
| 0 | 4631 | 0.6% |
| 1 | 11277 | |
| 2 | 6736 | |
| 3 | 12836 | |
| 4 | 5960 | |
| 5 | 1628 | 0.2% |
| 6 | 2337 | 0.3% |
| 7 | 782 | 0.1% |
| 8 | 270 | < 0.1% |
| 9 | 13735 |
| Value | Count | Frequency (%) |
| 571 | 19 | < 0.1% |
| 570 | 9 | < 0.1% |
| 569 | 62 | < 0.1% |
| 568 | 51 | < 0.1% |
| 567 | 74 | < 0.1% |
| 566 | 57 | < 0.1% |
| 565 | 21 | < 0.1% |
| 564 | 123 | |
| 563 | 291 | |
| 562 | 25 | < 0.1% |
storey_range
Real number (ℝ≥0)
| Distinct | 25 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.016930934 |
| Minimum | 0 |
| Maximum | 24 |
| Zeros | 161366 |
| Zeros (%) | 19.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 24 |
| Range | 24 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.986307884 |
|---|---|
| Coefficient of variation (CV) | 0.9848170062 |
| Kurtosis | 15.8558031 |
| Mean | 2.016930934 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 2.967187697 |
| Sum | 1683145 |
| Variance | 3.945419011 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=25)
| Value | Count | Frequency (%) |
| 1 | 210886 | |
| 2 | 190110 | |
| 3 | 170456 | |
| 0 | 161366 | |
| 4 | 53591 | 6.4% |
| 6 | 20203 | 2.4% |
| 5 | 9726 | 1.2% |
| 8 | 6302 | 0.8% |
| 7 | 2696 | 0.3% |
| 15 | 2689 | 0.3% |
| Other values (15) | 6483 | 0.8% |
| Value | Count | Frequency (%) |
| 0 | 161366 | |
| 1 | 210886 | |
| 2 | 190110 | |
| 3 | 170456 | |
| 4 | 53591 | 6.4% |
| 5 | 9726 | 1.2% |
| 6 | 20203 | 2.4% |
| 7 | 2696 | 0.3% |
| 8 | 6302 | 0.8% |
| 9 | 1142 | 0.1% |
| Value | Count | Frequency (%) |
| 24 | 11 | < 0.1% |
| 23 | 32 | < 0.1% |
| 22 | 30 | < 0.1% |
| 21 | 2 | < 0.1% |
| 20 | 7 | < 0.1% |
| 19 | 39 | < 0.1% |
| 18 | 92 | < 0.1% |
| 17 | 265 | < 0.1% |
| 16 | 1254 | |
| 15 | 2689 |
flat_model
Real number (ℝ≥0)
| Distinct | 20 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.471407105 |
| Minimum | 0 |
| Maximum | 19 |
| Zeros | 218654 |
| Zeros (%) | 26.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 13 |
| Maximum | 19 |
| Range | 19 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 3.305829455 |
|---|---|
| Coefficient of variation (CV) | 1.337630473 |
| Kurtosis | 5.227874159 |
| Mean | 2.471407105 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 2.330061455 |
| Sum | 2062409 |
| Variance | 10.92850838 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=20)
| Value | Count | Frequency (%) |
| 2 | 232339 | |
| 0 | 218654 | |
| 1 | 176515 | |
| 4 | 53893 | 6.5% |
| 3 | 39888 | 4.8% |
| 13 | 37151 | 4.5% |
| 6 | 32535 | 3.9% |
| 7 | 27314 | 3.3% |
| 15 | 9131 | 1.1% |
| 16 | 2106 | 0.3% |
| Other values (10) | 4982 | 0.6% |
| Value | Count | Frequency (%) |
| 0 | 218654 | |
| 1 | 176515 | |
| 2 | 232339 | |
| 3 | 39888 | 4.8% |
| 4 | 53893 | 6.5% |
| 5 | 1919 | 0.2% |
| 6 | 32535 | 3.9% |
| 7 | 27314 | 3.3% |
| 8 | 657 | 0.1% |
| 9 | 43 | < 0.1% |
| Value | Count | Frequency (%) |
| 19 | 65 | < 0.1% |
| 18 | 153 | < 0.1% |
| 17 | 311 | < 0.1% |
| 16 | 2106 | 0.3% |
| 15 | 9131 | 1.1% |
| 14 | 83 | < 0.1% |
| 13 | 37151 | |
| 12 | 1125 | 0.1% |
| 11 | 512 | 0.1% |
| 10 | 114 | < 0.1% |
lease_commence_date
Real number (ℝ≥0)
| Distinct | 54 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.00531451 |
| Minimum | 0 |
| Maximum | 53 |
| Zeros | 19302 |
| Zeros (%) | 2.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 6 |
| median | 16 |
| Q3 | 28 |
| 95-th percentile | 37 |
| Maximum | 53 |
| Range | 53 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 12.49658112 |
|---|---|
| Coefficient of variation (CV) | 0.7348632752 |
| Kurtosis | -0.7959343856 |
| Mean | 17.00531451 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.4680313186 |
| Sum | 14191071 |
| Variance | 156.1645396 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6 | 82157 | 9.8% |
| 4 | 59529 | 7.1% |
| 20 | 47501 | 5.7% |
| 21 | 40170 | 4.8% |
| 2 | 38465 | 4.6% |
| 9 | 36126 | 4.3% |
| 22 | 30249 | 3.6% |
| 5 | 29897 | 3.6% |
| 3 | 28743 | 3.4% |
| 30 | 28462 | 3.4% |
| Other values (44) | 413209 |
| Value | Count | Frequency (%) |
| 0 | 19302 | 2.3% |
| 1 | 19853 | 2.4% |
| 2 | 38465 | |
| 3 | 28743 | 3.4% |
| 4 | 59529 | |
| 5 | 29897 | 3.6% |
| 6 | 82157 | |
| 7 | 19359 | 2.3% |
| 8 | 12452 | 1.5% |
| 9 | 36126 |
| Value | Count | Frequency (%) |
| 53 | 10 | < 0.1% |
| 52 | 14 | < 0.1% |
| 51 | 759 | 0.1% |
| 50 | 3041 | |
| 49 | 6174 | |
| 48 | 2302 | 0.3% |
| 47 | 3794 | |
| 46 | 1980 | 0.2% |
| 45 | 3558 | |
| 44 | 1056 | 0.1% |
remaining_lease
Real number (ℝ≥0)
| Distinct | 53 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 80.78900742 |
| Minimum | 44 |
| Maximum | 96 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 44 |
|---|---|
| 5-th percentile | 62 |
| Q1 | 74 |
| median | 82 |
| Q3 | 89 |
| 95-th percentile | 94 |
| Maximum | 96 |
| Range | 52 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 9.988283617 |
|---|---|
| Coefficient of variation (CV) | 0.1236341915 |
| Kurtosis | -0.1912553966 |
| Mean | 80.78900742 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.6688467654 |
| Sum | 67419073 |
| Variance | 99.76580961 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 94 | 46878 | 5.6% |
| 93 | 42917 | 5.1% |
| 86 | 34301 | 4.1% |
| 92 | 33152 | 4.0% |
| 85 | 32868 | 3.9% |
| 84 | 32659 | 3.9% |
| 88 | 32110 | 3.8% |
| 87 | 31725 | 3.8% |
| 82 | 30697 | 3.7% |
| 90 | 30684 | 3.7% |
| Other values (43) | 486517 |
| Value | Count | Frequency (%) |
| 44 | 46 | < 0.1% |
| 45 | 132 | < 0.1% |
| 46 | 218 | < 0.1% |
| 47 | 334 | < 0.1% |
| 48 | 468 | 0.1% |
| 49 | 603 | |
| 50 | 691 | |
| 51 | 887 | |
| 52 | 1078 | |
| 53 | 1482 |
| Value | Count | Frequency (%) |
| 96 | 548 | 0.1% |
| 95 | 8753 | 1.0% |
| 94 | 46878 | |
| 93 | 42917 | |
| 92 | 33152 | |
| 91 | 28212 | |
| 90 | 30684 | |
| 89 | 30077 | |
| 88 | 32110 | |
| 87 | 31725 |
floor_area_sqft
Real number (ℝ≥0)
| Distinct | 209 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1031.196745 |
| Minimum | 301.3892 |
| Maximum | 3304.5173 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 301.3892 |
|---|---|
| 5-th percentile | 645.834 |
| Q1 | 785.7647 |
| median | 1001.0427 |
| Q3 | 1227.0846 |
| 95-th percentile | 1560.7655 |
| Maximum | 3304.5173 |
| Range | 3003.1281 |
| Interquartile range (IQR) | 441.3199 |
Descriptive statistics
| Standard deviation | 279.6785057 |
|---|---|
| Coefficient of variation (CV) | 0.2712174056 |
| Kurtosis | -0.374609383 |
| Mean | 1031.196745 |
| Median Absolute Deviation (MAD) | 215.278 |
| Skewness | 0.370672141 |
| Sum | 860541933.1 |
| Variance | 78220.06657 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 721.1813 | 62968 | 7.5% |
| 1119.4456 | 44166 | 5.3% |
| 731.9452 | 35027 | 4.2% |
| 904.1676 | 33809 | 4.1% |
| 1302.4319 | 27377 | 3.3% |
| 785.7647 | 26287 | 3.1% |
| 990.2788 | 24921 | 3.0% |
| 1108.6817 | 24777 | 3.0% |
| 979.5149 | 24776 | 3.0% |
| 699.6535 | 24287 | 2.9% |
| Other values (199) | 506113 |
| Value | Count | Frequency (%) |
| 301.3892 | 31 | < 0.1% |
| 312.1531 | 351 | |
| 333.6809 | 724 | |
| 365.9726 | 67 | < 0.1% |
| 376.7365 | 21 | < 0.1% |
| 398.2643 | 11 | < 0.1% |
| 409.0282 | 134 | < 0.1% |
| 419.7921 | 137 | < 0.1% |
| 430.556 | 697 | |
| 441.3199 | 351 |
| Value | Count | Frequency (%) |
| 3304.5173 | 1 | < 0.1% |
| 3196.8783 | 2 | < 0.1% |
| 3013.892 | 4 | < 0.1% |
| 2863.1974 | 4 | < 0.1% |
| 2809.3779 | 6 | < 0.1% |
| 2787.8501 | 2 | < 0.1% |
| 2690.975 | 3 | < 0.1% |
| 2680.2111 | 3 | < 0.1% |
| 2647.9194 | 2 | < 0.1% |
| 2615.6277 | 16 |
price_psqft
Real number (ℝ≥0)
| Distinct | 68088 |
|---|---|
| Distinct (%) | 8.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 286.3687721 |
| Minimum | 14.98437579 |
| Maximum | 1185.651721 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 14.98437579 |
|---|---|
| 5-th percentile | 107.4912247 |
| Q1 | 208.6747226 |
| median | 261.2900529 |
| Q3 | 357.3197304 |
| 95-th percentile | 499.1809965 |
| Maximum | 1185.651721 |
| Range | 1170.667345 |
| Interquartile range (IQR) | 148.6450079 |
Descriptive statistics
| Standard deviation | 121.6077674 |
|---|---|
| Coefficient of variation (CV) | 0.4246544289 |
| Kurtosis | 2.045085612 |
| Mean | 286.3687721 |
| Median Absolute Deviation (MAD) | 66.63587592 |
| Skewness | 0.9573344191 |
| Sum | 238977031.3 |
| Variance | 14788.44909 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 232.2578248 | 3075 | 0.4% |
| 185.8062598 | 2723 | 0.3% |
| 371.6125196 | 1897 | 0.2% |
| 278.7093897 | 1780 | 0.2% |
| 464.5156495 | 1670 | 0.2% |
| 232.2578248 | 1423 | 0.2% |
| 309.6770997 | 1418 | 0.2% |
| 214.3918382 | 1183 | 0.1% |
| 65.17085232 | 1143 | 0.1% |
| 348.3867371 | 1140 | 0.1% |
| Other values (68078) | 817056 |
| Value | Count | Frequency (%) |
| 14.98437579 | 1 | < 0.1% |
| 16.78250089 | 1 | < 0.1% |
| 17.0821884 | 1 | < 0.1% |
| 17.38187592 | 1 | < 0.1% |
| 17.98125095 | 3 | |
| 19.22133722 | 1 | < 0.1% |
| 20.07906356 | 1 | < 0.1% |
| 20.97812611 | 6 | |
| 21.41933273 | 1 | < 0.1% |
| 21.87718866 | 5 |
| Value | Count | Frequency (%) |
| 1185.651721 | 1 | |
| 1104.220058 | 1 | |
| 1097.047598 | 1 | |
| 1095.279005 | 1 | |
| 1092.262967 | 1 | |
| 1090.862558 | 1 | |
| 1079.031144 | 1 | |
| 1076.87714 | 1 | |
| 1075.720452 | 1 | |
| 1072.339319 | 1 |
resale_price
Real number (ℝ≥0)
| Distinct | 8642 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 299175.2295 |
| Minimum | 5000 |
| Maximum | 1268000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.4 MiB |
Quantile statistics
| Minimum | 5000 |
|---|---|
| 5-th percentile | 87000 |
| Q1 | 185000 |
| median | 280000 |
| Q3 | 390000 |
| 95-th percentile | 573000 |
| Maximum | 1268000 |
| Range | 1263000 |
| Interquartile range (IQR) | 205000 |
Descriptive statistics
| Standard deviation | 151875.4302 |
|---|---|
| Coefficient of variation (CV) | 0.5076470752 |
| Kurtosis | 1.123293377 |
| Mean | 299175.2295 |
| Median Absolute Deviation (MAD) | 100388 |
| Skewness | 0.8392258603 |
| Sum | 2.496641224 × 10¹¹ |
| Variance | 2.30661463 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 300000 | 6443 | 0.8% |
| 280000 | 6274 | 0.8% |
| 350000 | 6113 | 0.7% |
| 250000 | 6102 | 0.7% |
| 320000 | 6027 | 0.7% |
| 260000 | 5799 | 0.7% |
| 270000 | 5599 | 0.7% |
| 400000 | 5564 | 0.7% |
| 360000 | 5541 | 0.7% |
| 380000 | 5510 | 0.7% |
| Other values (8632) | 775536 |
| Value | Count | Frequency (%) |
| 5000 | 1 | < 0.1% |
| 5600 | 1 | < 0.1% |
| 5700 | 1 | < 0.1% |
| 5800 | 1 | < 0.1% |
| 6000 | 4 | < 0.1% |
| 6700 | 1 | < 0.1% |
| 7000 | 8 | |
| 7300 | 19 | |
| 7500 | 13 | |
| 7600 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1268000 | 1 | < 0.1% |
| 1258000 | 1 | < 0.1% |
| 1250000 | 1 | < 0.1% |
| 1248000 | 1 | < 0.1% |
| 1238800 | 1 | < 0.1% |
| 1232000 | 1 | < 0.1% |
| 1220000 | 1 | < 0.1% |
| 1218888 | 1 | < 0.1% |
| 1210000 | 3 | |
| 1208000 | 2 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
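The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report generator uses; the function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)

# Hypothetical sample data: y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting either variable, or rescaling it by a positive factor, leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_shifted)
```

Rescaling one variable by a negative factor flips the sign of r but not its magnitude.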
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
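As a minimal sketch of that definition, the example below ranks each variable (tied values share the average of their positions, a common convention that is assumed here) and then applies Pearson's formula to the ranks. The helper names and data are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

print(spearman_rho(x, y))  # perfect monotonic agreement
print(pearson_r(x, y))     # below 1 because the relationship is not linear
```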
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
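The pair-counting definition above can be written out directly. The sketch below implements exactly that formula (often called tau-a); library implementations commonly apply an additional tie correction, so their results may differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(zip(x, y), 2))
    for (x1, y1), (x2, y2) in pairs:
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 is a tie in x or y; it only counts toward the total
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

# 10 pairs in total: 7 concordant, 3 discordant.
print(kendall_tau(x, y))
```

The quadratic loop over all pairs is fine for illustration but would be slow on a dataset of this report's size.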
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | date | town | flat_type | block | street_name | storey_range | flat_model | lease_commence_date | remaining_lease | floor_area_sqft | price_psqft | resale_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1990-01-01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 86 | 333.6809 | 26.971876 | 9000.0 |
| 1 | 1990-01-01 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 86 | 333.6809 | 17.981251 | 6000.0 |
| 2 | 1990-01-01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 86 | 333.6809 | 23.975001 | 8000.0 |
| 3 | 1990-01-01 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 86 | 333.6809 | 17.981251 | 6000.0 |
| 4 | 1990-01-01 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 84 | 785.7647 | 60.068873 | 47200.0 |
| 5 | 1990-01-01 | 0 | 1 | 2 | 1 | 3 | 1 | 0 | 86 | 721.1813 | 63.784238 | 46000.0 |
| 6 | 1990-01-01 | 0 | 1 | 3 | 1 | 2 | 1 | 0 | 86 | 721.1813 | 58.237783 | 42000.0 |
| 7 | 1990-01-01 | 0 | 1 | 4 | 1 | 0 | 1 | 0 | 86 | 721.1813 | 52.691327 | 38000.0 |
| 8 | 1990-01-01 | 0 | 1 | 4 | 1 | 1 | 1 | 0 | 86 | 721.1813 | 55.464555 | 40000.0 |
| 9 | 1990-01-01 | 0 | 1 | 5 | 1 | 3 | 1 | 0 | 86 | 721.1813 | 65.170852 | 47000.0 |
Last rows
| | date | town | flat_type | block | street_name | storey_range | flat_model | lease_commence_date | remaining_lease | floor_area_sqft | price_psqft | resale_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 834498 | 2021-06-01 | 21 | 3 | 705 | 218 | 2 | 0 | 21 | 64 | 1302.4319 | 410.770037 | 535000.0 |
| 834499 | 2021-06-01 | 21 | 3 | 80 | 222 | 0 | 0 | 6 | 62 | 1345.4875 | 401.341521 | 540000.0 |
| 834500 | 2021-06-01 | 21 | 3 | 444 | 223 | 3 | 0 | 21 | 64 | 1313.1958 | 318.983658 | 418888.0 |
| 834501 | 2021-06-01 | 21 | 3 | 2516 | 558 | 2 | 0 | 51 | 94 | 1216.3207 | 487.535894 | 593000.0 |
| 834502 | 2021-06-01 | 21 | 3 | 2391 | 558 | 0 | 0 | 50 | 93 | 1216.3207 | 476.847923 | 580000.0 |
| 834503 | 2021-06-01 | 21 | 3 | 2375 | 558 | 0 | 0 | 50 | 93 | 1205.5568 | 510.137722 | 615000.0 |
| 834504 | 2021-06-01 | 21 | 5 | 278 | 258 | 3 | 6 | 25 | 69 | 1948.2659 | 445.524402 | 868000.0 |
| 834505 | 2021-06-01 | 21 | 5 | 451 | 223 | 3 | 7 | 6 | 62 | 1636.1128 | 357.554809 | 585000.0 |
| 834506 | 2021-06-01 | 21 | 5 | 68 | 287 | 3 | 7 | 21 | 64 | 1571.5294 | 381.793685 | 600000.0 |
| 834507 | 2021-06-01 | 21 | 5 | 817 | 303 | 1 | 7 | 20 | 65 | 1571.5294 | 454.970807 | 715000.0 |
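The report does not say how price_psqft was produced, but the sample rows are consistent with it being resale_price divided by floor_area_sqft, which would account for the high-correlation alert between price_psqft and resale_price. The check below is a sketch under that assumption, using four rows copied from the First rows and Last rows tables.

```python
import math

# (floor_area_sqft, price_psqft, resale_price) copied from the sample rows.
rows = [
    (333.6809, 26.971876, 9000.0),
    (785.7647, 60.068873, 47200.0),
    (1302.4319, 410.770037, 535000.0),
    (1571.5294, 454.970807, 715000.0),
]

for area, psqft, price in rows:
    derived = price / area  # assumed definition of price_psqft
    match = math.isclose(derived, psqft, rel_tol=1e-5)
    print(f"{derived:.6f} vs reported {psqft} -> match: {match}")
```

The tolerance allows for the displayed floor areas being rounded to four decimal places.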